
(11) EP 0 890 943 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 22.12.1999 Bulletin 1999/51

(43) Date of publication A2: 13.01.1999 Bulletin 1999/02

(21) Application number: 98112167.6

(22) Date of filing: [illegible]

(51) Int. Cl.6: G10L 9/14

(84) Designated Contracting States:
AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV MK RO SI

(30) Priority: 11.07.1997 JP 20247597

(71) Applicant: NEC CORPORATION
Tokyo (JP)

(72) Inventor: Nomura, Toshiyuki
Minato-ku, Tokyo (JP)

(74) Representative:
VOSSIUS & PARTNER
Siebertstrasse 4
81675 München (DE)

(54) Voice coding and decoding system



(57) A first CELP coding circuit (14), receiving a signal
obtained by down-sampling an input signal in a down-sampling
circuit (1), outputs a part of its coded output to a second
CELP coding circuit. The second CELP coding circuit (15)
encodes the input signal on the basis of the coded output of
the first CELP coding circuit. A multiplexer (7) outputs the
coded outputs of the first and second CELP coding circuits in
the form of a bit stream. When a control signal indicates a
low bit rate, a demultiplexer (18) outputs the coded output of
the first CELP coding circuit from the bit stream to a first
CELP decoding circuit (16). When the control signal indicates
a high bit rate, the demultiplexer extracts a part of the
output of the first CELP coding circuit and the output of the
second CELP coding circuit and passes them to a second CELP
decoding circuit (17) for output via a switch circuit (19).

(54) Scalable encoding of media streams

(57) A scalable encoder (100) for encoding a media signal
comprises first encoding means (210) for producing a first
data stream (102), which is a core data stream relating to the
media signal (101), having a first bit-rate; second encoding
means (230) for producing a second data stream (103), which
comprises a set of enhancement data streams relating to the
media signal, having a second bit-rate; and a multiplexer (110)
for combining at least the first data stream and the second
data stream into a third data stream (104). The scalable
encoder is characterized in that it further comprises control
means (420, 421, 422), which is arranged to receive control
information (401), to determine a target combination of the
first data stream and the second data stream in the third data
stream according to the control information and to adjust the
combination of the first data stream and the second data
stream in the third data stream by affecting the first and the
second bit-rates. A multimedia terminal having a scalable
encoder and a method for encoding data are also presented.

Description

[0001] The invention relates in general to encoding of 
media streams. In particular the invention relates to 
scalable encoding methods and to scalable encoders. 
[0002] In general, media streams are encoded, in other words
compressed, before they are, for example, transmitted over a
communication network or stored for further use. A media
stream may be, for example, a video clip, which is a sequence
of video frames, or an audio clip, which is typically
digitized speech or music. In a multimedia application, for
example, several media streams can be transmitted
simultaneously.
[0003] Using a suitable decoder it is possible to produce from
an encoded media stream a decoded media stream that is similar
to, or exactly the same as, the original media stream that was
encoded. If the decoded media stream is the same as the
original, the encoding is lossless. Most encoding methods,
however, are lossy: the decoded media stream differs from the
original.

[0004] The term scalability refers to encoding a media 
stream into a compressed stream, which can be decod- 
ed at different data rates. Typically part of the encoded 
data stream is a core data stream, decoding of which 
produces a decoded media stream having a perceived 
quality, which is worse than the perceived quality of the 
original media stream. The encoded data stream further 
comprises other enhancement data streams, and if 
these are used in the decoding process in addition to 
the core data stream, the perceived quality of the de- 
coded media stream is enhanced. Because a scalable 
multimedia stream has these core and enhancement 
streams, it can be manipulated relatively easily while it 
is compressed so that it can be streamed over channels 
with different bandwidths and still be decoded and, fur- 
thermore, played back in real-time. 
[0005] Scalability is a desirable property for heteroge- 
neous and error prone environments. It is desirable in 
order to counter limitations such as constraints on trans- 
mission bit rate, network throughput, and decoder com- 
plexity. In multicast or broadcast transmission, for ex- 
ample, scalable encoding allows the various receivers 
to receive data at different data rates or to decode the 
transmitted encoded data stream with different decod- 
ers, which have a common core decoder. Furthermore, 
scalability can be used to improve error resilience in a 
transport system where scalable encoding is combined 
with transport prioritisation. Here the term transport pri- 
oritisation refers to various mechanisms to provide dif- 
ferent qualities of service in transport, including unequal 
error protection, to provide different channels having
different error/loss rates. Depending on their nature, data
are assigned to different channels: for example, the encoded
core data stream may be delivered through a channel with a
high degree of error protection, and the enhancement data
streams may be transmitted through more error-prone channels.

[0006] Figure 1 presents schematically a scalable encoder 100
and a corresponding decoder 130. The media stream 101 is input
to the scalable encoder 100, which produces a core data stream
102 and an enhancement data stream 103. Typically these data
streams are fed to a multiplexer 110, which produces a
scalable encoded data stream 104. This multiplexed data stream
is then, for example, transmitted further or stored for
further use. During decoding the scalable encoded data stream
104 is demultiplexed in a demultiplexer 120 into a core data
stream 102 and possible enhancement data stream(s) 103. It is
possible, for example, that the enhancement data stream(s) is
(are) not present in the received data stream 104, for
example, due to limited transmission resources. The decoder
130 takes as inputs the core data stream 102 and the possible
enhancement data stream(s) 103, and produces a decoded audio
signal 105. The perceived quality of the decoded audio signal
105 typically depends on whether the enhancement data
stream(s) 103 is (are) used in the decoding. It is also
possible that a certain decoder cannot utilize particular
enhancement data stream(s), but nevertheless it can decode the
core data stream 102.
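The multiplexing and demultiplexing of Figure 1 can be illustrated with a minimal sketch. The frame layout used here (two 2-byte length fields followed by the payloads) and the names `ScalableFrame`, `multiplex` and `demultiplex` are assumptions made purely for illustration; the description does not specify any particular bit-stream format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalableFrame:
    core: bytes                    # core data stream 102
    enhancement: Optional[bytes]   # enhancement data stream 103 (may be absent)


def multiplex(core: bytes, enhancement: Optional[bytes]) -> bytes:
    """Pack one frame of stream 104 (hypothetical layout, not from the source):
    2-byte core length, 2-byte enhancement length, then both payloads."""
    enh = enhancement or b""
    header = len(core).to_bytes(2, "big") + len(enh).to_bytes(2, "big")
    return header + core + enh


def demultiplex(frame: bytes) -> ScalableFrame:
    """Split a frame back into the core and the optional enhancement data."""
    core_len = int.from_bytes(frame[0:2], "big")
    enh_len = int.from_bytes(frame[2:4], "big")
    core = frame[4:4 + core_len]
    enh = frame[4 + core_len:4 + core_len + enh_len]
    # An empty enhancement field is treated as "enhancement not present".
    return ScalableFrame(core, enh if enh_len else None)
```

A receiver that obtains only the core part still gets a decodable frame, which mirrors the case where the enhancement data stream is missing from the received stream 104.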
[0007] Figure 2 shows schematically an example of a scalable
audio encoder for encoding multimedia audio streams, which
typically comprise speech and/or other audio signals. The
scalable encoder 200 comprises a core encoder 210, which is,
for example, specially designed for encoding speech. It may
be, for example, a 3GPP AMR (Adaptive Multi-Rate) speech
encoder, which comprises various codecs operating at nominal
rates between 4.75 and 12.2 kbit/s. The scalable encoder 200
furthermore comprises an enhancement encoder 230, which is
designed for encoding general audio streams. The enhancement
encoder can, for example, consist of an MPEG-4 AAC audio
encoder. The core encoder 210 produces a core data stream 102
from an audio stream 101. The core data stream 102 is fed to a
core decoder 220, which decodes the core data stream and
produces a decoded core data stream 201. The difference stream
202 is the difference between the original audio stream 101
and the decoded core data stream 201, and it is fed to an
enhancement encoder 230 together with the original audio
stream 101. The original audio stream 101 is needed in the
enhancement encoder 230 typically for determining the
psychoacoustic model for quantiser bit allocation. The
enhancement encoder 230 produces an enhancement data stream
103. The core data stream 102 and the enhancement data stream
103 are multiplexed into a scalable encoded data stream 104 in
multiplexer 110. Figure 2 also shows core buffer 240 and
enhancement buffer 250, which are the output buffers of the
core and enhancement encoders.
[0008] Figure 3 shows schematically a decoder 300
corresponding to the scalable encoder 200. The scalable
encoded data stream 104 is demultiplexed into a core data
stream 102, which is fed to a core decoder 220, and into an
enhancement data stream 103, which is fed to an enhancement
decoder 310. The core decoder 220 is typically similar to that
present in the scalable encoder 200, and it produces a decoded
core data stream 201. The enhancement decoder 310 produces a
decoded enhancement data stream 301, which is combined with
the decoded core data stream 201. The result is a decoded
audio signal 105.
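The layered structure of Figures 2 and 3 can be sketched with a toy example in which plain uniform quantisers stand in for the speech and audio codecs. This substitution, the step sizes and the function names are assumptions for illustration only; a real core encoder such as AMR and a real enhancement encoder such as AAC are far more elaborate.

```python
def quantise(samples, step):
    """Toy 'encoder': map each sample to an integer index."""
    return [round(s / step) for s in samples]


def dequantise(indices, step):
    """Toy 'decoder': map each index back to a sample value."""
    return [i * step for i in indices]


def scalable_encode(samples, core_step=0.5, enh_step=0.05):
    core = quantise(samples, core_step)               # core data stream 102
    decoded_core = dequantise(core, core_step)        # decoded core stream 201
    difference = [s - d for s, d in zip(samples, decoded_core)]  # stream 202
    enhancement = quantise(difference, enh_step)      # enhancement stream 103
    return core, enhancement


def scalable_decode(core, enhancement=None, core_step=0.5, enh_step=0.05):
    out = dequantise(core, core_step)                 # decoded core stream 201
    if enhancement is not None:
        # Combine with the decoded enhancement stream 301.
        refinement = dequantise(enhancement, enh_step)
        out = [c + e for c, e in zip(out, refinement)]
    return out


def max_error(original, decoded):
    return max(abs(o - d) for o, d in zip(original, decoded))
```

Decoding the core alone yields a coarse reconstruction, and adding the enhancement stream reduces the reconstruction error, which corresponds to the improvement in perceived quality described above.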

[0009] Typically, but not necessarily, the core speech
encoder operates with lower bit rate and sampling fre- 
quency than the enhancement audio encoder. The sam- 
pling rates of the core and enhancement encoders may 
be the same or different, depending on what encoders 
are used. Usually the encoded enhancement data 
stream improves the perceived quality of the synthe- 
sized signal by adding the higher bandwidth signal com- 
ponents. 

[0010] In scalable audio multimedia encoders the 
core speech encoder typically operates at constant bit 
rate, possibly utilising voice activity detection (VAD) and 
discontinuous transmission (DTX). The enhancement 
layer encoder, on the other hand, typically operates at 
a variable rate. Target bit-rates for the core and en- 
hancement encoders are typically adjusted independ- 
ently based on the transmission rate of the transmission 
channel, where the transmission rate is typically a nom- 
inal transmission rate. To be able to transmit the scala- 
ble encoded data stream, the bit rate of this data stream 
should, of course, on average be less than the available 
transmission rate. 

[0011] Even though encoding algorithms effectively 
compress multimedia data, the limiting factor of the 
process, especially in terminals that operate over a radio 
interface, is transmission capacity, and therefore opti- 
mization of the use of this limited resource is very im- 
portant. Generally, scalable multimedia encoding suf- 
fers from a worse compression efficiency than non-scal- 
able encoding. In other words, a multimedia clip encod- 
ed as a scalable multimedia clip with all enhancement 
layers requires greater bandwidth than if it had been en- 
coded as a non-scalable single-layer clip with an equiv- 
alent perceived quality. Because of its numerous advan- 
tages, the use of scalable encoding is highly desirable 
and thus it would be advantageous if a method allowing 
more efficient use of available transmission capacity 
could be implemented. 

[0012] The core and enhancement data to be trans- 
mitted is temporarily stored in a multiplexer buffer, from 
where data chunks to be transmitted are extracted, for 
example, periodically. Typically the oldest data is ex- 
tracted from the multiplexer buffer, and the ratio of the 
bit-rates of the core and enhancement data stream de- 
termines the ratio of the core and enhancement data 
streams in the transmitted data flow. In this case it is 
possible, for example, that a variable rate audio encoder 
may produce such a large burst of data that the transmission
of this data burst causes delay jitter in the transmission of
the core speech data. Alternatively, it is possible to
prioritise the core (speech) data so that the enhancement data
stream is transmitted using transmission capacity that is not
used to transmit the core data stream. In this way it is
possible to better guarantee that the core data stream is
transmitted properly.
[0013] The available space in the multiplexer buffer is
determined by the bit-rates of the core and enhancement data
streams, as data is inserted into the multiplexer buffer at an
overall bit-rate equivalent to the sum of the core and
enhancement data bit-rates, and by the transmission bit-rate,
at which data is extracted from the multiplexer buffer. The
multiplexer buffer has a certain size. Typically, at least one
of the core and enhancement data streams has a variable rate,
and therefore it is possible that a data burst fills the
remaining multiplexer buffer space, or even cannot be stored
entirely in the buffer. This situation is called a multiplexer
buffer overflow. Dynamic changes in the instantaneous
transmission rate are another example of a possible cause of a
multiplexer buffer overflow. If the transmission rate
decreases for a certain time, data is extracted from the
multiplexer buffer at a smaller rate for a while, and the
occupancy of the multiplexer buffer increases, possibly
leading to a multiplexer buffer overflow. In a situation like
this, if there is in addition a data burst, the risk of a
multiplexer buffer overflow increases further. It is possible
to try to overcome the multiplexer buffer overflow problem
using a larger multiplexer buffer, but this typically results
in increased transmission delays. Furthermore, a large buffer
is an inefficient way to solve the problem, as for most of the
time the extra space is not required.
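The overflow mechanism described in paragraph [0013] can be illustrated with a small simulation. The function name, the per-interval bit counts and the order of operations (insert first, then extract) are assumptions chosen for this sketch and are not taken from the description.

```python
def simulate_mux_buffer(core_bits, enh_bits, tx_bits, capacity):
    """Track multiplexer buffer occupancy over successive intervals.

    Returns (history, overflow_index); overflow_index is None when every
    interval's data fits into the buffer.
    """
    occupancy = 0
    history = []
    for t, (core, enh, tx) in enumerate(zip(core_bits, enh_bits, tx_bits)):
        occupancy += core + enh            # insertion at the sum of both rates
        if occupancy > capacity:
            history.append(occupancy)      # burst cannot be stored entirely
            return history, t
        occupancy = max(0, occupancy - tx) # extraction at the transmission rate
        history.append(occupancy)
    return history, None


# Constant-rate core, variable-rate enhancement with one burst (toy numbers).
core = [100, 100, 100, 100]
enh = [200, 200, 900, 200]
tx = [320, 320, 320, 320]
small = simulate_mux_buffer(core, enh, tx, capacity=600)
large = simulate_mux_buffer(core, enh, tx, capacity=1200)
```

With the smaller buffer the burst in the third interval overflows, while the larger buffer absorbs it at the cost of a higher residual occupancy, which corresponds to the increased transmission delay mentioned above.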

[0014] An object of the invention is to provide a versatile
method for scalable encoding of a multimedia data stream, a
scalable encoder and a multimedia terminal comprising a
scalable encoder. A further object of the invention is to
provide a scalable encoding method, scalable encoder and
multimedia terminal having a scalable encoder, where the risk
of multiplexer buffer overflow can be significantly reduced. A
further object is to provide a scalable encoding method,
scalable encoder and multimedia terminal having a scalable
encoder, where the scalable encoded data stream can be
adjusted to meet various and possibly dynamically changing
circumstances.

[0015] These and further objects of the invention are achieved
by determining a ratio of target bit-rates for the core data
stream and enhancement data stream and, as long as the
transmission rate allows, adjusting the core data stream and
the enhancement data stream in such a way that the ratio is
substantially maintained.
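As a sketch of the ratio-based approach of paragraph [0015], the following function splits an available transmission rate into core and enhancement targets whose ratio is fixed. The function name, the optional minimum core rate and the fallback behaviour when the ratio cannot be kept are hypothetical choices made for illustration.

```python
def joint_targets(tx_rate, ratio, core_min=0.0):
    """Split tx_rate (kbit/s) so that core / enhancement == ratio.

    core_min is an assumed lowest usable core rate. If the ratio would push
    the core below it, the core is protected and the enhancement stream
    receives whatever capacity remains.
    """
    if tx_rate < 0 or ratio <= 0:
        raise ValueError("tx_rate must be >= 0 and ratio must be > 0")
    core = tx_rate * ratio / (1.0 + ratio)
    if core < core_min:
        core = min(core_min, tx_rate)   # ratio can no longer be maintained
    return core, tx_rate - core
```

For example, with a ratio of 0.5 and 24 kbit/s available, the core receives one third of the capacity and the enhancement stream two thirds; when the capacity drops to 9 kbit/s and a minimum core rate of 4.75 kbit/s is assumed, the ratio is given up in favour of the core.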
[0016] A scalable encoder according to the invention is an
encoder for encoding a media signal, which comprises

first encoding means for producing a first data stream, which
is a core data stream relating to the media signal, having a
first bit-rate,

second encoding means for producing a second data stream,
which comprises a set of enhancement data streams relating to
the media signal, having a second bit-rate, and

a multiplexer for combining at least the first data stream and
the second data stream into a third data stream, and it is
characterized in that it further comprises control means,
which is arranged to receive control information, to determine
a target combination of the first data stream and the second
data stream in the third data stream according to the control
information and to adjust the combination of the first data
stream and the second data stream in the third data stream by
affecting the first and the second bit-rates.

[0017] A multimedia terminal according to the inven- 
tion comprises a scalable encoder comprising first en- 
coding means for producing a first data stream, which 
is a core data stream relating to the media signal, having 
a first bit-rate; second encoding means for producing a 
second data stream, which comprises a set of enhance-
ment data streams relating to the media stream, having
a second bit-rate; and a multiplexer for combining at 
least the first data stream and the second data stream 
into a third data stream, and it is characterized in that it 
further comprises a control unit, which is arranged to re- 
ceive control information, to determine a target combi- 
nation of the first data stream and the second data 
stream in the third data stream according to the control 
information and to adjust the combination of the first da- 
ta stream and the second data stream in the third data 
stream by affecting the first and the second bit-rates. 
[0018] The invention relates also to a method for scalable
encoding of a media signal, which method comprises
the steps of:

encoding the media signal into a first data stream, 
which is a core data stream corresponding to the 
media signal, having a first bit rate, 
encoding the media signal into a second data 
stream, which comprises a set of enhancement da- 
ta streams corresponding to the media signal, hav- 
ing a second bit rate, and 

multiplexing at least the first data stream and the 
second data stream into a third data stream, and 
which method is characterized in that it further com- 
prises the steps of: 

receiving control information, 

determining a target combination of the first data 
stream and the second data stream in the third data 
stream according to the control information, and 
adjusting the combination of the first data stream 
and the second data stream in the third data stream 
by affecting the first and the second bit-rates. 

[0019] Here the term control information refers to information
that is used in determining a target combination of the core
data stream and enhancement data stream in the combined
encoded data stream. Possible changes in the transmission rate
and in the bit-rates of the core (first) and enhancement
(second) data streams cause the occupancy of the multiplexer
buffer to change. Therefore, information indicating the
occupancy of the multiplexer buffer is an example of control
information that may be used to provide a controlling feedback
for determining the target bit-rates for the core and
enhancement streams. Other examples of control information
are user preferences relating to the combination of the core
and enhancement data streams. The user preference information
can originate from the transmitting and/or receiving
user/terminal.
[0020] One of the main ideas in the invention is to determine
a suitable combination for the core data stream and
enhancement data stream jointly, instead of adjusting the
target bit rates for these data streams independently. By
controlling data streams using, for example, the multiplexer
buffer occupancy information, the operation of the scalable
encoders can be adjusted to the current purpose and, for
example, to the condition of the transmission channel. The
limited transmission capacity is also used more efficiently
than in a solution where only the bit rate of the enhancement
data stream is adjusted or where the bit rates of the core and
enhancement data streams are adjusted independently.
[0021] Furthermore, when the bit-rates of both the core and
enhancement data stream are adjusted jointly, it is possible
to sustain a given ratio between the bit-rates. On the other
hand if, for example, a user prefers speech to audio, it is
possible to reduce the bit rate of an audio stream
significantly and to try to sustain the perceived quality of
transmitted speech. Versatile scalable encoding is thus
possible by applying the invention. When encoding a media
stream according to the invention, the bit rate of either or
both of the core and enhancement data streams can be adjusted,
and therefore the available transmission capacity can be used
more fully. Due to this joint control of core and enhancement
data streams, the danger of multiplexer buffer overflow will
also decrease, and consequently the total buffer space can, in
an optimal case, be reduced, thereby also decreasing the
transmission delay. In a situation where there is only a
limited amount of space available in the multiplexer buffer,
it is possible according to the invention, for example, to
reduce the bit-rate of both the core and enhancement data
streams, instead of only reducing the bit-rate of the
enhancement data stream.
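One way to picture the joint reduction described in paragraph [0021] is a common scaling factor that depends on the multiplexer buffer occupancy. The linear rule, the threshold and the floor value below are assumptions for illustration; the description leaves the actual control law open.

```python
def joint_scale(core_target, enh_target, occupancy, capacity,
                threshold=0.5, floor=0.25):
    """Scale both target bit-rates by the same occupancy-dependent factor.

    Below `threshold` (fraction of the buffer in use) the targets are left
    unchanged; above it the factor falls linearly to `floor` at a full
    buffer. Because both targets share one factor, their ratio is kept.
    """
    fill = min(max(occupancy / capacity, 0.0), 1.0)
    if fill <= threshold:
        factor = 1.0
    else:
        excess = (fill - threshold) / (1.0 - threshold)
        factor = max(floor, 1.0 - excess * (1.0 - floor))
    return core_target * factor, enh_target * factor
```

Reducing only the enhancement target would correspond to applying the factor to `enh_target` alone; applying it to both, as here, lowers the combined bit-rate faster for the same change in the enhancement stream.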
[0022] A scalable encoder may produce a set of enhancement
data streams. In this case, the core data stream and the
enhancement data streams forming the set of enhancement data
streams are multiplexed into the scalable encoded data stream.
The number of enhancement data streams may be adjusted, for
example, when the occupancy of a multiplexer buffer is above a
certain threshold, and/or the bit-rate allocated to each
enhancement data stream may be adjusted. The bit-rates
allocated for each enhancement data stream can be adjusted
independently or, for example, the bit-rate allocated for each
enhancement data stream may be adjusted in a similar way.
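The adjustment of the number of enhancement data streams in paragraph [0022] might be sketched as follows. The proportional dropping rule and the function name are hypothetical; the description only states that the number may be adjusted when the occupancy exceeds a threshold.

```python
def layers_to_keep(num_layers, occupancy, capacity, threshold):
    """Return how many enhancement data streams to multiplex.

    All quantities are integers (e.g. bits). At or below `threshold` all
    layers are kept; above it, layers are dropped from the top in
    proportion to how far the occupancy exceeds the threshold.
    """
    if not 0 <= threshold < capacity:
        raise ValueError("threshold must satisfy 0 <= threshold < capacity")
    occupancy = min(max(occupancy, 0), capacity)
    if occupancy <= threshold:
        return num_layers
    excess = occupancy - threshold
    span = capacity - threshold
    drop = -(-excess * num_layers // span)   # integer ceiling division
    return max(0, num_layers - drop)
```

Integer arithmetic is used so that the result does not depend on floating-point rounding near the layer boundaries.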

[0023] A scalable encoder according to the invention 
may be part of an encoding arrangement, where, for ex- 
ample, both audio signals and video signals are encod- 
ed. Such an encoding arrangement may comprise, for 
example, two scalable encoders (one for audio signal 
and one for video signal) or one non-scalable encoder 
and one scalable encoder. 

[0024] The appended dependent claims describe 
some preferred embodiments of the invention. 
[0025] The invention is described in more detail below 
with reference to preferred embodiments of the inven- 
tion and to the enclosed figures, in which 

Figure 1 shows schematically a scalable encoder 
and a corresponding decoder according to 
prior art, 

Figure 2 shows schematically a scalable encoder 
having a speech encoder and an audio en- 
coder according to prior art, 

Figure 3 shows schematically a prior-art decoder 
corresponding to the scalable encoder 
presented in Figure 2, 

Figure 4 shows schematically a scalable encoder 
according to a first preferred embodiment 
of the invention, 

Figure 5 shows schematically a scalable encoder 
according to a second preferred embodi- 
ment of the invention, 

Figure 6 shows schematically a control unit relating 
to a core encoder according to a third pre- 
ferred embodiment of the invention, 

Figure 7 shows schematically a control unit relating 
to a core encoder according to a fourth pre- 
ferred embodiment of the invention, 

Figure 8 shows schematically a control unit relating 
to an enhancement encoder according to 
a fifth preferred embodiment of the inven- 
tion, 

Figure 9 shows schematically a control unit relating 
to a core encoder and an enhancement 
encoder according to a sixth preferred em- 
bodiment of the invention, 

Figure 10 shows a flowchart corresponding to a
method according to the invention,

Figure 11 shows schematically a scalable encoder
for audio and video streams according to
a seventh preferred embodiment of the in-
vention,

Figure 12 shows schematically an H.324 multimedia
terminal according to the invention, and

Figure 13 shows an example of a wireless multime- 
dia terminal according to the invention. 

[0026] Figures 1-3 were discussed in connection with the
description of prior art scalable encoders.
[0027] In the detailed description of the invention, one
enhancement data stream is discussed as an example. It is
possible that a scalable encoder according to the invention
produces a set of enhancement data streams comprising more
than one enhancement data stream.
[0028] Figure 4 shows schematically a scalable encoder
arrangement 400 according to a first preferred embodiment of
the invention. It comprises a scalable encoder 410 and a
control unit 420, which is arranged to adjust the bit rates of
the core data stream 102 and the enhancement data stream 103.
The control unit 420 receives control information 401, which
it uses in determining a target combination of the core and
enhancement data streams. Usually suitable target bit-rates,
which can be target average bit-rates and/or target maximum
bit-rates, are determined for the core and enhancement data
streams. It is possible that the control unit 420, in addition
to determining the target combination, also monitors the
current bit rates 402, 403 of the core data stream and
enhancement data stream and, for example, adjusts the encoder
so that the selected target bit rate is achieved. The current
bit rates are typically measured using the output buffers 431,
432 of the scalable encoder 410.

[0029] Figure 5 shows schematically a scalable encoder
arrangement 500 according to a second preferred embodiment of
the invention. This scalable encoder 500 comprises, as an
example, a speech core encoder 210 and an audio enhancement
encoder 230. The speech encoder and the audio encoder are
typically similar to the encoders presented in connection with
Figure 2. The speech encoder may be, for example, a variable
rate speech encoder, or a multi-rate speech encoder having a
certain set of available encoding algorithms producing encoded
speech at different nominal bit-rates. A variable rate speech
encoder may be, for example, a variable rate speech encoder as
described in the document "Toll quality variable rate speech
codec", Pasi Ojala, Proceedings of IEEE International
Conference on Acoustics, Speech and Signal Processing, Munich,
Germany, April 1997. A multi-rate speech encoder may be, for
example, a 3GPP AMR (Adaptive Multi-Rate) speech encoder.
[0030] In Figure 5 two possible sources of control information
are shown as examples. It is possible to use information 401b
about the occupancy level of the multiplexer buffer 520 as
control information. For example, when the transmission
capacity of a transmission channel is dynamically changing or
if the bit-rate of the enhancement data stream increases
suddenly for a certain period of time, there may be a danger
of multiplexer buffer overflow, as discussed above in
connection with the prior art description. According to the
invention, when there is a danger of multiplexer buffer
overflow, it is possible to reduce the target bit rate of the
core data stream and/or the enhancement data stream to reduce
the bit-rate of the combined data stream.

[0031] Figure 5 also presents input element 510 for receiving
preference information 501. The input element is typically a
part of a multimedia terminal, of which the encoder 500 is a
part. The preference information 501 provided to the input
element 510 can originate from many different sources. The
input can come from the user of the transmitting terminal,
wherein the input element is typically part of the user
interface of the multimedia terminal. The term user interface
means, for example, a combination of a keyboard, a screen and
appropriate software to transform the commands given by the
user into a formatted preference indication. The preference in
such a solution can also be adjusted, e.g., with the help of a
slide switch, where positioning the switch at one end of its
scale means full preference for high quality voice,
positioning the switch at the opposite end means full
preference for high quality audio, and positioning the switch
somewhere in between indicates the direction of trade-off
between speech and audio. The input can also come from some
external source, e.g. from the receiving user, wherein the
input element is a part of the receiving functions of the
multimedia terminal. The control input can be received, for
example, as part of call control or as in-band signalling. The
information can be provided at the beginning of communication
or updated during communication. Furthermore, it is possible
that certain preset values indicating appropriate combinations
of core and enhancement data streams are stored in the
multimedia terminal or in the encoder itself. These preset
values can be, for example, dependent on the transmission
channel bit-rate.
[0032] The preference information 501 indicates the preferred
combination of the core and enhancement data streams 102, 103
in the scalable encoded data stream 104, and the possible
options comprise any combination from full subsidiarity (0%)
to full preference (100%) for one bit-stream, including any
trade-off combination therebetween. The preference information
501 is transformed into control information 401a, and this
control information 401a is input to the speech and audio
bit-rate control units 421, 422. The speech bit-rate control
unit 421 and the audio bit-rate control unit 422 are arranged
to adjust the target bit-rates of encoding according to the
preferred proportions set by the preference indication. After
this the encoders 210, 230 are arranged to operate at said
target bit-rate levels. Typically, if the preference is for
high speech quality, the control information 401a causes the
control units 421, 422 to set a relatively high target
bit-rate for the core encoder and a relatively low target
bit-rate for the audio encoder. The target bit rates can be
average bit-rates and/or maximum bit-rates. The ways in which
an encoder is arranged to adjust the bit-rate are discussed in
more detail below.

[0033] In a situation where it is expected that a terminal
receiving the scalable encoded data stream is able to decode
only the core data stream, it is preferable for the core data
stream to have a higher bit-rate than the enhancement data
stream. It is possible, for example, that before the actual
encoding of data begins the terminals involved in a multimedia
session inform each other of their encoding and decoding
capabilities. This information can be used as control
information. It is reasonable to give higher priority to the
control information received from a receiving terminal/user
than to that coming from the transmitting terminal/user.
[0034] As Figure 5 shows, the current bit rate 402 of the core
data stream or the current bit rate 403 of the enhancement
data stream can be monitored and used in adjusting the core
and enhancement encoders. Furthermore, arrow 502 in Figure 5
illustrates possible information exchange between the control
units 421 and 422. The joint control of the target bit-rates
for core and enhancement data streams can be implemented as
separate control units, which communicate with each other, or
alternatively as separate control units with a common logic
enabling joint control of the bit-rates. Furthermore, it is
possible to have a single control unit, which selects the
target bit-rates and monitors the current bit-rates of both
the core encoder and the enhancement encoder.

[0035] An encoder according to the invention can be 
implemented in hardware, in software, or as a combina- 

35 tion of hardware and software. It may be, for example, 
a computer program comprising computer program 
code means adapted to perform necessary steps (for 
example, the steps of a method according to the inven- 
tion) when said program is run on a computer. The computer
program may be embodied on a computer-readable
medium.

[0036] Figure 6 shows schematically a control unit 
421 relating to a variable rate core encoder 210 according
to a third preferred embodiment of the invention. The
variable rate encoder may be, for example, a variable
rate speech encoder. For control purposes, the bit-rate
of the core data stream 102 from the core encoder 210
is monitored and fed to a feedback filter 601, where it
is averaged to smoothen short-term variations in the bit-rate.
The estimated average bit-rate 611 obtained in this
way is subtracted from the target bit-rate 612 of the core
encoder 210 to derive an error signal 613 that is fed to
a controller 603 that generates a control signal 614 for
the core encoder 210. The encoding algorithm used in
the speech encoder is adjusted according to the control
signal 614 received from the controller 603. The details of
the adjustment depend on the encoding algorithm: typ- 
ically, for example, the quantization of the coefficients
representing the original media signal is adjusted. In the
controller 603, any control algorithm or logic can be
used. For example, PI (Proportional Integral) type of
control, generally known to a person skilled in the art, is
possible. The target bit-rate 612 is determined in a target
bit rate determination unit 602, which is also part of the
control unit 421. The control information 401 affects the
determination of the target bit-rate, which is typically de- 
termined jointly with the target bit-rate of the enhance- 
ment encoder 230. Arrow 502 in Figure 6 illustrates the 
exchange of information between these control units. 
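
The control loop of Figure 6 can be illustrated with a short sketch. It assumes a first-order exponential averaging filter in place of the feedback filter 601 and a PI law in place of the controller 603; the class names, the gains and the smoothing constant are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass
class FeedbackFilter:
    """Hypothetical first-order averaging filter (stands in for filter 601)."""
    alpha: float = 0.1      # assumed smoothing constant
    average: float = 0.0    # estimated average bit-rate (signal 611)

    def update(self, measured_bit_rate: float) -> float:
        # An exponential moving average smooths short-term variations.
        self.average += self.alpha * (measured_bit_rate - self.average)
        return self.average


@dataclass
class PIController:
    """Hypothetical PI controller (stands in for controller 603)."""
    kp: float = 0.5         # assumed proportional gain
    ki: float = 0.05        # assumed integral gain
    integral: float = 0.0

    def update(self, error: float) -> float:
        # Accumulate the error for the integral term, then combine.
        self.integral += error
        return self.kp * error + self.ki * self.integral


def control_step(filt, ctrl, target_bit_rate, measured_bit_rate):
    """One pass of the loop: returns the control signal (614)."""
    estimate = filt.update(measured_bit_rate)
    error = target_bit_rate - estimate      # error signal 613
    return ctrl.update(error)
```

A positive control signal indicates that the estimated average bit-rate is below the target, a negative one that it is above. How the encoder maps the signal to its quantization or thresholds is left open here, as it depends on the encoding algorithm.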
[0037] The function of the control loop is substantially 
to drive the estimated average bit-rate 611 to follow the 
given target bit-rate 612, and the input signal 101 can 
be considered as a disturbance to the control loop. For
example, in the case of a source-controlled variable-rate
speech encoder, the bit-rate is selected using adaptive 
thresholds. The control signal 614 from the controller 
603 can be used as a tuning factor for the selection of 
an adaptive threshold for the speech encoder 210. A more
detailed description of the use of adaptive
thresholds for controlling the bit-rate can be found e.g.
in the document "Toll quality variable-rate speech co- 
dec", Pasi Ojala, Proceedings of IEEE International 
Conference on Acoustics, Speech and Signal Process- 
ing, Munich, Germany, April 1997. In addition to the con-
trol of the average bit-rate, the maximum bit-rate of the 
speech encoder can also be controlled by limiting the 
use of codebooks requiring the highest bit-rates. By applying
control both to the average bit-rate and to the maximum
bit-rate of the encoder, the bit-rate of the encoded core
data stream 102 can be targeted to a given level.
[0038] Figure 7 shows schematically a control unit
421 relating to a core encoder 210 according to a fourth
preferred embodiment of the invention. Here the core 
encoder 210 is a multi-rate encoder, which comprises a 
set of separate encoding algorithms, each producing 
encoded speech at a certain bit rate. The control infor- 
mation 401 is fed to a target bit-rate determination unit 
602, where the target bit rate for the core encoder 210 
is determined. It is determined jointly with the target bit 
rate for the enhancement data stream. Arrow 502 in Fig- 
ure 7 illustrates the exchange of information between 
the core control unit 421 and the enhancement control 
unit 422. The determined target bit-rate 612 is fed to an 
encoding mode selection unit 701, which selects a suit-
able encoding algorithm and transmits control signal 
711 indicating the selected encoding algorithm to the 
core encoder 210. 
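
The selection made by the encoding mode selection unit 701 can be sketched as follows. The rule used here (pick the highest available bit-rate that does not exceed the target, otherwise the lowest) is an assumption made for illustration; the description only states that a suitable algorithm is selected. The function name and the example rates are hypothetical.

```python
def select_encoding_mode(target_bit_rate, available_bit_rates):
    """Return the index of the mode best matching the target bit-rate.

    Assumed rule: the highest available bit-rate not exceeding the
    target; if every mode exceeds the target, the lowest-rate mode.
    """
    if not available_bit_rates:
        raise ValueError("at least one encoding mode is required")
    candidates = [
        (rate, index)
        for index, rate in enumerate(available_bit_rates)
        if rate <= target_bit_rate
    ]
    if candidates:
        return max(candidates)[1]
    return min(
        (rate, index) for index, rate in enumerate(available_bit_rates)
    )[1]


# Illustrative mode table in bit/s (hypothetical values).
MODE_RATES = [4750, 5900, 7400, 12200]
selected = select_encoding_mode(8000, MODE_RATES)
```

The returned index plays the role of the control signal 711 indicating the selected encoding algorithm.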

[0039] Figure 8 shows schematically a control unit
422 relating to an enhancement encoder 230 according
to a fifth preferred embodiment of the invention. The en- 
hancement encoder is typically a variable rate encoder. 
It is possible, for example, to monitor the average bit- 
rate of the enhancement data stream 103 using a filter 
801, which smoothens short-term variations in the bit-rate
and produces an estimated average bit-rate 811. A
target bit-rate 812, which is selected in a target bit-rate
selection unit 802 jointly with the target bit-rate for a core
encoder (see arrow 502 in Figure 8) and using control
information 401, is fed together with the average bit-rate
811 to a bit-rate adjustment unit 803. Typically the output
bit-rate of a variable rate audio encoder is adjusted, for
example, by selecting a suitable quantization accuracy
for the frequency domain transform coefficients, which
the audio encoder produces. It is also possible to adjust
the output bit rate by adjusting the audio bandwidth. The
term audio bandwidth means the frequency range of the
audio signal to be encoded. It can be, for example, 0 -
12 kHz or 0 - 16 kHz. By increasing the audio bandwidth,
the number of frequency domain coefficients required 
to represent the audio signal increases. 
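
The relation between audio bandwidth and the number of coefficients can be made concrete with a small calculation. It assumes that the transform coefficients of a frame are spaced uniformly from 0 Hz to half the sampling rate; the sampling rate and the frame size in the example are hypothetical.

```python
def coefficients_in_band(bandwidth_hz, sample_rate_hz, coefficients_per_frame):
    """Number of transform coefficients at or below bandwidth_hz.

    Assumes the coefficients are spaced uniformly between 0 Hz and the
    Nyquist frequency (sample_rate_hz / 2).
    """
    nyquist = sample_rate_hz / 2
    fraction = min(bandwidth_hz, nyquist) / nyquist
    return int(fraction * coefficients_per_frame)


# Hypothetical setup: 32 kHz sampling, 512 coefficients per frame.
narrow = coefficients_in_band(12_000, 32_000, 512)
wide = coefficients_in_band(16_000, 32_000, 512)
```

Under these assumptions the 0 - 16 kHz band needs one third more coefficients per frame than the 0 - 12 kHz band, which is why widening the bandwidth raises the bit-rate at a given quantization accuracy.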

[0040] Figure 9 shows schematically a control unit
420 relating to a core encoder 210 and to an enhancement
encoder 230 according to a sixth preferred embodiment
of the invention. Here the core encoder 210 comprises
a set of available encoding algorithms producing
encoded speech at various bit rates. The control unit
420 comprises a rate determination algorithm (RDA)
unit 901, where the content of the signal 101 is analyzed.
The rate determination algorithm described here analyses
the speech content of an audio signal, but it is possible
to use any signal content analyzer. The rate deter-
mination algorithm unit 901 selects the encoding algo- 
rithm, which produces an encoded enhancement data 
stream having the smallest bit-rate while still providing 
adequate audio quality. It is possible, for example, to use
long-term periodicity and prediction gains as selection
factors. Long-term periodicity refers to fundamental frequencies
present in the signal; periodic signals give high
long-term prediction gain and typically indicate voiced
sounds. To achieve good quality, accurate coding of the
periodic components is required. This typically means
the selection of an encoding algorithm producing encod- 
ed speech at a relatively high bit-rate. On the other hand, 
low long-term prediction gain typically indicates non- 
voiced sound, and long-term coding is typically not required.
This means that a lower bit-rate is sufficient to
accurately represent the signal to be encoded. Short- 
term prediction is another technique commonly used in 
the encoding of audio data, specifically speech data, 
and it typically involves modeling of the signal spectrum
(frequency spectrum) using linear prediction coding
(LPC). A good LPC fit usually indicates that the signal 
contains speech and thus requires a high core bit rate 
to achieve good speech quality. 
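
The long-term prediction gain mentioned above can be estimated as in the following sketch. It assumes a one-tap long-term predictor with a positive tap searched over a range of candidate lags; the function name, the lag range and the restriction to positive correlation are illustrative assumptions, not details given in the description.

```python
def long_term_prediction_gain(samples, min_lag, max_lag):
    """Return (gain, lag) of the best one-tap long-term predictor.

    The gain is the ratio of signal energy to prediction-residual
    energy, maximised over the candidate lags. A gain close to 1
    means the signal shows little long-term periodicity.
    """
    if min_lag < 1:
        raise ValueError("min_lag must be at least 1")
    best_gain, best_lag = 1.0, min_lag
    for lag in range(min_lag, max_lag + 1):
        current = samples[lag:]
        delayed = samples[:-lag]
        energy = sum(x * x for x in current)
        delayed_energy = sum(y * y for y in delayed)
        cross = sum(x * y for x, y in zip(current, delayed))
        if energy == 0.0 or delayed_energy == 0.0 or cross <= 0.0:
            continue  # nothing to predict, or no positive correlation
        # Residual energy for the optimal tap b = cross / delayed_energy.
        residual = energy - cross * cross / delayed_energy
        gain = energy / max(residual, 1e-12 * energy)
        if gain > best_gain:
            best_gain, best_lag = gain, lag
    return best_gain, best_lag
```

A rate determination unit could compare the returned gain with a threshold to decide whether a higher-rate encoding algorithm is warranted; the choice of threshold is outside the scope of this sketch.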

[0041] Furthermore, it is possible, for example, to use
the signal-to-noise ratio (SNR) of the decoded core data
stream as a core encoding algorithm selection factor.
For example, all encoding algorithms may be run in parallel
and the one producing the best SNR is selected. In
addition, it is possible to use signal energy and frequency
content in selecting a suitable encoding algorithm or
target bit rate for the core encoder.
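
Selection by SNR can be sketched as follows. Each candidate algorithm is represented by a hypothetical callable that encodes and then decodes a block of samples; the names `snr_db` and `select_by_snr` and the toy quantizers in the example are assumptions made for illustration.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio of a decoded block, in decibels."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def select_by_snr(original, codecs):
    """Run every candidate and return the name of the best-SNR one.

    codecs maps a name to a callable performing encode followed by
    decode, so that its output can be compared with the input.
    """
    return max(codecs, key=lambda name: snr_db(original, codecs[name](original)))


# Toy candidates: uniform quantizers with different step sizes.
block = [0.12, -0.37, 0.81, 0.44]
candidates = {
    "coarse": lambda xs: [round(x / 0.5) * 0.5 for x in xs],
    "fine": lambda xs: [round(x / 0.1) * 0.1 for x in xs],
}
best = select_by_snr(block, candidates)
```

In practice the selection would also weigh the bit-rate of each candidate, since the highest-rate algorithm tends to give the best SNR.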
[0042] Typically the bit-rates of the core and enhance- 
ment data stream are adjusted independently of each
other once the target bit-rates for the data streams have
been determined jointly. Although the target bit-rates are 
determined jointly, it is possible to change the target bit- 
rate (812) of an enhancement data stream, for example,
more often than that (612) of the core data stream. This
may be advantageous, for example, when the enhance- 
ment encoder is a variable-rate encoder and the core
encoder is a multi-rate encoder. 

[0043] Figure 10 shows, as an example, a flowchart
corresponding to a method according to the invention,
in which the target bit-rate of a core encoder and the
target bit-rate of an enhancement encoder are determined
jointly using control information. Step 1001 is performed
when encoding is begun to determine initial control
information. The initial control information may be,
for example, a preset default setting or it may originate,
for example, from the transmitting user/terminal or from
the receiving user/terminal. In step 1002 the target combination
of the core data stream and the enhancement
data stream is determined according to the initial control
information. Thereafter, during the encoding process,
the loop formed by steps 1003-1010 is executed. In step
1003, the availability of multiplexer buffer space is determined.
If there is enough available buffer space (for
example, the buffer occupancy is less than a certain first
threshold T1 in step 1004), target bit-rates for the core
and enhancement data streams are determined in step
1005 using, for example, RDA information and user
preference information as control information. For example,
it is possible to estimate the actual transmission
rate from the buffer occupancy and the bit-rates of the
core and enhancement data streams. The RDA determines
a certain bit rate, for example, for the core bit
stream. The bit-rates determined using the RDA can be
allowed as long as the transmission rate is probably
large enough for transmitting a data stream having a
core data stream, whose bit-rate is that determined by
the RDA, and an enhancement data stream, whose bit-rate
is determined by preference information indicating
a ratio of the target bit-rates for the core and enhancement
data streams. If the RDA suggests too large a bit-rate,
then it is possible to sustain the preferred bit-rate ratio
by reducing the bit-rates of the core and enhancement
streams accordingly. If the RDA suggests such a low bit-rate
for the core data stream that part of the available
transmission capacity would be left unused assuming
the given combination of the core and enhancement data
streams, it is possible to select a higher target bit-rate
for the core data stream and, respectively, for the enhancement
data stream.

[0044] If the buffer occupancy is, for example, over
the first threshold T1 (indicating that the risk of multiplexer
buffer overflow is increased) but below a second
threshold T2 (step 1006), the bit-rates of the core and/or
enhancement data streams may be limited according
to, for example, the user preference information or default
setting information, in step 1007 by adjusting the
target bit-rates of the core and/or enhancement data
streams. If the multiplexer buffer occupancy exceeds
the second threshold T2 (step 1008), indicating that the
multiplexer buffer is substantially full, the bit rates of the
core and enhancement data streams are restricted further
in step 1009. This may mean, for example, that the
enhancement encoder is not used at all and, for example
in the case of a multi-rate core encoder, the encoding
algorithm producing a core data stream having the
smallest bit rate is selected.
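
The three occupancy regions of the flowchart of Figure 10 can be summarised in a short sketch. The function name, its parameters and in particular the back-off factor applied between the two thresholds are hypothetical; the description leaves the exact limiting policy open. The sketch also does not raise the core bit-rate above the RDA suggestion when capacity would otherwise be left unused.

```python
def determine_targets(occupancy, t1, t2, rda_core_rate, preferred_ratio,
                      capacity, min_core_rate, backoff=0.5):
    """Return (core_target, enhancement_target) in bit/s.

    occupancy, t1, t2: buffer fill level and thresholds (t1 < t2)
    rda_core_rate:     core bit-rate suggested by the RDA
    preferred_ratio:   preferred core/enhancement target ratio (> 0)
    capacity:          estimated available transmission rate
    min_core_rate:     lowest bit-rate the core encoder can produce
    backoff:           assumed capacity reduction between t1 and t2
    """
    if occupancy >= t2:
        # Steps 1008-1009: buffer substantially full, core only.
        return min_core_rate, 0.0
    # Steps 1004-1005 use the full capacity estimate; steps 1006-1007
    # limit the streams by shrinking the budget.
    budget = capacity if occupancy < t1 else backoff * capacity
    # Share of the budget given to the core at the preferred ratio.
    core_share = preferred_ratio / (1.0 + preferred_ratio)
    core = min(rda_core_rate, budget * core_share)
    return core, core / preferred_ratio
```

With a preferred ratio of 1.0, a 40 kbit/s capacity estimate and an RDA suggestion of 12 kbit/s, the sketch returns 12 kbit/s for each stream while the occupancy is below T1, and 10 kbit/s for each stream between T1 and T2.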

[0045] When the potential overflow situation has 
passed, the occupancy level of the buffer decreases. 
This means that at some point after the potential over- 
flow situation the occupancy level of the multiplexer buffer
is below T2, and the target bit-rates for the core and
enhancement data streams can be adjusted according
to the preferred bit-rate ratio. Furthermore, when the occupancy
level of the multiplexer buffer is below T1, it may
be possible to use a target bit-rate determined by RDA 
for the core data stream. 

[0046] It is also possible that the user preference in- 
formation or other control information is provided or updated
during the encoding process. In that case, the target
combination of the core data stream and the enhance- 
ment data stream is determined according to the pro- 
vided/updated control information in the loop comprising 
steps 1003-1009. 

[0047] After the target bit-rates for the core and en- 
hancement data streams are determined in step 1005 
(or in step 1007 or 1009), the bit-rates of the core and 
enhancement data streams are adjusted in step 1010 
according to the determined target bit-rates. In step 
1010 the bit-rates can be adjusted using, for example, 
arrangements presented in Figures 7-9. Typically this adjustment
of the bit-rates is a continuous activity, which
goes on also during the execution of steps 1003-1009.
The target bit-rates for the adjustment are updated 
(steps 1005, 1007, 1009), for example, every time infor- 
mation about the occupancy of a multiplexer buffer is 
received.

[0048] Figure 10 does not show explicitly the receipt
of the data to be encoded, the actual encoding or the 
multiplexing of the core and enhancement data streams 
into a combined data stream. These are, however, all 
typically carried out in a method according to the invention.

[0049] There are also scalable video encoders, which 
typically comprise base layer (core) encoding and en- 
hancement layer encoding implemented in a single en- 
coder. Thus Figure 4 also schematically presents a typical
scalable video encoder according to the invention.
A video sequence consists of a series of still pictures, 
which are displayed consecutively, each frame separat- 
ed from the other by a certain interval of time. Video 
compression/encoding methods are based on reducing
redundant and perceptually irrelevant parts of video sequences.
The redundancy in video sequences can be
categorized into spatial, temporal and spectral redundancy.
The term spatial redundancy refers to the correlation
between neighboring pixels within an image. Tem-
poral redundancy refers to the similarity between con- 
secutive pictures in a video sequence. Reducing the 
temporal redundancy reduces the amount of data re- 
quired to represent a particular image sequence and 
thus compresses the data. This can be achieved by gen- 
erating motion compensation data, which describes the 
motion between the current and a previous (reference) 
picture. In effect, the current picture is predicted from 
the previous one. The term spectral redundancy refers 
to the correlation between the different color compo- 
nents of the same picture. 

[0050] Scalable video encoding may use temporal 
scalability, signal-to-noise ratio scalability or spatial 
scalability. Temporal scalability provides a mechanism 
for enhancing perceptual quality by increasing the pic- 
ture display rate. This is achieved by taking a pair of 
consecutive reference pictures and predicting additional 
pictures from either one or both of them. The additional 
predicted pictures can then be played in sequence be- 
tween the two reference pictures. The additional predict- 
ed pictures are not used as reference pictures them- 
selves, that is, other pictures are never predicted from
them or otherwise encoded using them. Thus, they can 
be discarded without impacting the picture quality of fu- 
ture pictures, and therefore they provide temporal scal- 
ability. Spatial scalability and SNR scalability are closely 
related, the only difference being the increased spatial 
resolution provided by spatial scalability. SNR scalability 
implies the creation of multi-rate bit streams. It enables 
the recovery of coding errors, or differences between an 
original picture and its reconstruction from the base lay- 
er data stream. This is achieved by using a finer quan- 
tizer to encode a difference picture in an enhancement 
layer. This additional information increases the SNR of 
the overall reproduced picture. 
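
The principle of SNR scalability can be shown with a toy two-layer quantizer. It treats a picture as a flat list of sample values and uses uniform quantizers; the function names and step sizes are hypothetical, and a real video encoder would quantize transform coefficients rather than raw samples.

```python
def quantize(values, step):
    """Uniform quantizer: returns the reconstructed values for a step size."""
    return [round(v / step) * step for v in values]


def encode_two_layers(picture, base_step, enhancement_step):
    """Return (base reconstruction, quantized enhancement difference)."""
    base = quantize(picture, base_step)
    difference = [p - b for p, b in zip(picture, base)]
    # The finer quantizer encodes what the base layer lost.
    enhancement = quantize(difference, enhancement_step)
    return base, enhancement


def mean_squared_error(a, b):
    """Average squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


picture = [12.0, 57.0, 131.0, 200.0, 33.0]
base, enhancement = encode_two_layers(picture, base_step=16, enhancement_step=2)
refined = [b + e for b, e in zip(base, enhancement)]
```

A decoder that receives only the base layer reproduces `base`; one that also receives the enhancement layer reproduces `refined`, whose error is bounded by half the finer step size.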

[0051] Spatial scalability allows for the creation of 
multi-resolution bit streams to meet varying display re- 
quirements and/or constraints. It is essentially the same 
as in SNR scalability except that a spatial enhancement 
layer attempts to recover the coding loss between an 
up-sampled version of the reconstructed reference layer
picture and a higher resolution version of the original 
picture. For example, if the reference layer has a quarter 
common intermediate format (QCIF) resolution 
(176x144 pixels), and the enhancement layer has a 
common intermediate format (CIF) resolution (352x288 
pixels), the reference layer picture must be scaled ac- 
cordingly such that the enhancement layer picture can 
be predicted from it. The QCIF standard allows the res- 
olution to be increased by a factor of two in the vertical 
direction only, the horizontal direction only, or both the 
vertical and horizontal directions for a single enhance- 
ment layer. Thus, there can be multiple enhancement 
layers, each increasing the picture resolution over that 
of the previous layer. 
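
The scaling of the reference layer picture can be sketched as follows. Nearest-neighbour up-sampling by a factor of two in both directions is assumed purely for simplicity (a practical encoder would normally use an interpolation filter), and the function names are hypothetical.

```python
def upsample_2x(picture):
    """Nearest-neighbour 2x up-sampling of a picture given as rows of pixels."""
    out = []
    for row in picture:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def spatial_residual(high_res, low_res_reconstruction):
    """Difference that a spatial enhancement layer would have to encode."""
    predicted = upsample_2x(low_res_reconstruction)
    return [
        [h - p for h, p in zip(h_row, p_row)]
        for h_row, p_row in zip(high_res, predicted)
    ]


# A QCIF-sized reference picture (176x144) scales to CIF size (352x288).
qcif = [[0] * 176 for _ in range(144)]
cif_prediction = upsample_2x(qcif)
```

The residual returned by `spatial_residual` is what the enhancement layer encodes, in the same way as the difference picture in SNR scalability.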

[0052] In scalable video encoders the enhancement 
data stream typically comprises additional predicted
frames (temporal scalability) and/or additional informa-
tion about the coefficients describing the original frame. 
In a scalable video encoder according to the invention, 
the accuracy of the base layer frame and the accuracy
of the enhancement layer frame are typically adjusted by
adjusting quantization of the coefficients or, in temporal 
scalability, also by adjusting the number of additional 
predicted frames. 

[0053] Figure 11 shows schematically an example of
an arrangement comprising two scalable encoders, encoder
500 for encoding an audio signal 101a and encoder
400 for encoding a video signal 101b, according
to a seventh preferred embodiment of the invention. In
this embodiment, control information is delivered to control
units 420, 421, 422 of both scalable encoders. The
control information may indicate, for example, a user 
preference between audio and video streams and/or 
fine tuning preferences for the scalable audio encoder 
500 and for the scalable video encoder 400. 
[0054] Figure 12 shows a functional block diagram of
a multimedia communication terminal 20 according to
the invention. As an example, the multimedia terminal
20 is an H.324 multimedia terminal. An H.324 compatible
multimedia communication system, as shown in Figure
12, consists of a terminal unit 20, an interface unit
31, a GSTN (General Switched Telephone Network)
network 32, and a multipoint control unit (MCU) 33.
H.324 implementations are not required to have each
functional element. Mobile terminals may be implemented
with any appropriate wireless interface as an interface
unit 31 (as specified in H.324 Annex C). In this case
the network is a PLMN (Public Land Mobile Network) 
rather than a GSTN. 

[0055] The MCU 33 works as a bridge that centrally
directs the flow of information in the GSTN network 32
to allow communication among several terminal units.
The interface unit 31 converts the multiplexed bit-stream
into a signal that can be transmitted over the GSTN, and
converts the received signal into a bit-stream that is sent
to the multiplex/demultiplex protocol unit 21 of the terminal
20. The multiplex protocol multiplexes encoded
media, data and control streams into a single bit-stream
for transmission, and demultiplexes a received bit-stream
into various media streams. In addition, it performs
logical framing, sequence numbering, error detection,
and error correction e.g. by means of retransmission,
as appropriate to each media type. The control
protocol 22 of the system control 26 provides end-to-end
signaling for operation of the multimedia terminal,
and signals all other end-to-end system functions. It provides
for capability exchange, signaling of commands
and indications, and messages to open and fully describe
the content of logical channels. The data protocols
23 support data applications 27 such as electronic
whiteboards, still image transfer, file exchange, database
access, audiographics conferencing, remote device
control, network protocols etc. The scalable encoder
500 according to the invention encodes the audio
and/or video signal from the media I/O equipment 28 for
transmission. The media I/O equipment typically com- 
prises a microphone and a loudspeaker for the capture/ 
reproduction of audio signals and a display and a digital 
camera for the capture/reproduction of video signals. 
The scalable encoder 500 typically receives information 
about the occupancy level of a multiplexer buffer 520 
which is a part of the multiplexer unit 22. Typically there 
is also a corresponding decoder unit, but it is not shown 
in Figure 12. The decoded media signal is presented to 
the user using the media I/O equipment. A multimedia 
terminal according to the invention comprises at least 
one scalable encoder according to the invention for en- 
coding a signal from media I/O equipment. A multimedia 
terminal according to the invention may comprise a scal- 
able encoder arrangement for encoding audio and video 
signals, as illustrated in detail by the example in Figure 
11. 

[0056] It is also possible that encoding and decoding
algorithms, which do not correspond to each other,
are used. In other words, it is possible to use one encoding
algorithm in one direction of a bi-directional multimedia
connection and a second encoding algorithm in
the other direction. It is alternatively possible that a
multimedia connection is unidirectional, as for example 
in multimedia streaming where a multimedia data 
stream is retrieved from a source resident in a network 
and is decoded and played back at a receiving multime- 
dia terminal. In this case an encoder according to the 
invention would be located in the network. 
[0057] Figure 13 illustrates the functional modules of 
an embodiment for a wireless multimedia terminal 1300 
according to the invention. A Central Processing Unit 81 
controls the blocks responsible for the mobile station's 
various functions: a Memory (MEM) 82, a Radio Fre- 
quency block (RF) 83, a User Interface (UI) 84 and an
Interface Unit (IU) 85. The CPU is typically implemented 
with one or more functionally inter-working microproc- 
essors. The memory preferably comprises a ROM 
(Read Only Memory), a RAM (Random Access Memo- 
ry) and is generally supplemented with memory sup- 
plied with a SIM User Identification Module. In accord- 
ance with its program, the microprocessor uses the RF 
block 83 for transmitting and receiving signals on a radio 
path. Communication with the user is managed via the 
UI 84, which typically comprises a loudspeaker, a mi-
crophone, a display and a keyboard. The Interface Unit 
85 provides a link to a data processing entity, and it is 
controlled by the CPU 81. The data processing entity
may be e.g. an integrated data processor or external da- 
ta processing equipment, such as a personal computer. 
The mobile terminal according to the invention also 
comprises at least one scalable encoder according to 
the invention; in Figure 13 a scalable encoder 500 is 
shown. Typically a mobile terminal according to the in- 
vention also comprises a corresponding decoder. The 
mobile terminal also comprises a multiplexer 88 for generating
a composite data-stream comprising the core
and enhancement data-streams output by the scalable
encoder and control information. It also generates decomposed
data-streams for decoding from the received
data-stream. The multiplexer is arranged to output the
encoded multiplexed bit-streams into a multiplexer buffer
520. The scalable encoder 500 comprises control
means, which is typically connected by a control data
feedback loop to control the operations of the encoding
processes and receives information about the occupancy
level of the multiplexer buffer 520 as described in connection
with Figure 5. Although only two data-streams
are presented in Figure 13, more than two bit-streams
(e.g. control data, data for data applications, etc. as
shown in Figure 12) can also be involved. A target bit-rate
for each data stream is set according to, for example,
the preference information received by the terminal,
and a policy for making adjustments to those targets in 
case of multiplexer buffer overflow is defined, in a man- 
ner described earlier. 
[0058] The input element 510 in a mobile terminal can
be arranged to receive preference information through
the user interface 84 as described in Figures 5 and 13.
The input element 510 in a mobile terminal can also be
arranged to receive preference information from the terminal
with which it is communicating, using control signals
provided by the communication protocol used between
the two terminal entities. The latest ITU-T (ITU
Telecommunication Standardization Sector) videophone
standards, such as ITU-T H.324 and H.323, use
the H.245 control protocol to initialize a connection, i.e.
open logical channels, exchange capability sets etc.
This control protocol can also be used to send commands
and indications during the connection, and these
can be used to convey control information relating to the
preferences of a receiving user/terminal to a transmitting
terminal (see unit 510 in Figure 12).
[0059] Although the invention has been illustrated and 
described in terms of a preferred embodiment, those 
persons of ordinary skill in the art will recognize that modifications
to the preferred embodiment may be made with-
out departure from the scope of the invention as claimed 
below. 



Claims

1. A scalable encoder (100) for encoding a media signal,
which comprises

- first encoding means (210) for producing a first
data stream (102), which is a core data stream
relating to the media signal (101), having a first
bit-rate,
- second encoding means (230) for producing a
second data stream (103), which comprises a
set of enhancement data streams relating to the
media signal, having a second bit-rate, and
- a multiplexer (110) for combining at least the
first data stream and the second data stream
into a third data stream (104), characterized
in that it further comprises control means (420,
421, 422), which is arranged to receive control
information (401), to determine a target combination
of the first data stream and the second
data stream in the third data stream according
to the control information and to adjust the combination
of the first data stream and the second
data stream in the third data stream by affecting
the first and the second bit-rates.

2. A scalable encoder according to claim 1, charac-
terized in that at least one of the first and second 
encoding means is a variable rate encoding means. 

3. A scalable encoder according to claim 2, charac- 
terized in that the control means has means (602, 
802) for determining a target bit-rate at least for the 
data stream produced by said one of the first and 
second encoding means and is arranged to adjust 
the bit-rate of said data stream. 

4. A scalable encoder according to claim 2, charac- 
terized in that the control means further comprises 
a feedback loop (601, 801), comparison means and
a controller unit (603, 803); 

said feedback loop arranged to transfer infor- 
mation on an estimated actual bit-rate of said 
data stream to the comparison means; 
said comparison means being supplied with a
target bit-rate and arranged to calculate the difference
between the estimated actual bit-rate of
said data stream and the target bit-rate and to pro-
vide the calculated difference to the controller 
unit; 

said controller unit being arranged to output a 
control signal to said one of the first and second 
encoding means, as a response to receiving 
said calculated difference; and 
said one of the first and second encoding 
means being arranged to adjust the bit-rate of 
said data stream according to the received con- 
trol signal from the controller unit. 

5. A scalable encoder according to claim 4, charac- 
terized in that said one of the first and second en- 
coding means is arranged to adjust quantization of 
coefficients representing the media signal accord- 
ing to the control signal. 

6. A scalable encoder according to claim 4, charac- 
terized in that said one of the first and second en- 
coding means is the first encoding means, which is 
a variable rate speech encoder. 

7. A scalable encoder according to claim 4, characterized
in that said one of the first and second en-
coding means is the second encoding means, 
which is a variable rate audio encoder. 

8. A scalable encoder according to claim 7, charac-
terized in that the variable rate audio encoder is 
arranged to determine a bandwidth for the media 
signal according to the control signal. 

9. A scalable encoder according to claim 1, charac-
terized in that at least one of the first and second 
encoding means is a multi-rate encoding means 
having a set of available encoding algorithms. 

10. A scalable encoder according to claim 9, characterized
in that the control means has means (602)
for determining a target bit-rate for at least the data
stream produced by said one of the first and second
encoding means, means (701, 901) for selecting an
encoding algorithm among said set of encoding algorithms
and for indicating said selected encoding
algorithm to said one of the first and second encod- 
ing means, which is arranged to use the indicated 
encoding algorithm. 

11. A scalable encoder according to claim 10, charac- 
terized in that said means for selecting an encod- 
ing algorithm comprise rate determination means 
(901). 

12. A scalable encoder according to claim 9, charac- 
terized in that said one of the first and second en- 
coding means is the first encoding means, which is 
a multi-rate speech encoder. 

13. A scalable encoder according to any of the preced- 
ing claims, characterized in that it further compris- 
es means (602, 802) for determining jointly a first 
target bit-rate for the first data stream and a second
target bit-rate for the second data stream according
to said control information. 

14. A scalable encoder according to claim 13, charac- 
terized in that it further comprises a multiplexer
buffer (520) for storing data from the multiplexer for
transmission, and in that said multiplexer buffer is
connected to the control means for delivering control
information (401b) indicating the occupancy level
of said multiplexer buffer, said occupancy level
indicating the current amount of data stored in the
multiplexer buffer.

15. A scalable encoder according to claim 14, charac- 
terized in that the means (602, 802) for determining
jointly a first target bit-rate for the first data
stream and a second target bit-rate for the second
data stream are arranged to adjust the target bit-rates
so that the ratio of the target bit-rates is substantially
constant as long as the occupancy level
of the buffer is below a certain first threshold. 

16. A scalable encoder according to any of the preced- 
ing claims, characterized in that the control means 
is arranged to receive control information (401a) in-
dicating a preferred combination of the first and sec- 
ond data streams. 

17. A scalable encoder according to claim 16, characterized in that
said control information indicating a preferred combination of the
first and second data streams is used to determine a preferred ratio
of the target bit-rate of the first data stream and the target
bit-rate of the second data stream.

18. A scalable encoder according to any of the preceding claims,
characterized in that

- it further comprises decoding means for decoding said first data
  stream into a decoded signal, and
- said second encoding means are arranged to encode a difference
  signal, which is the difference between the media signal and the
  decoded signal, said second encoding means producing the second
  data stream having said second bit-rate.
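
Editorial illustration (not part of the claims): the arrangement of claim 18 is a residual, or layered, coding scheme. The sketch below uses two toy uniform quantizers in place of real speech and audio codecs, which is purely an assumption for illustration; the step sizes and function names are likewise hypothetical. It shows the essential structure: the core stream is decoded locally, and the enhancement stream encodes the difference between the input and that decoded signal.

```python
def quantize(samples, step):
    """Toy encoder: uniform quantization to integer indices."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Toy decoder matching quantize()."""
    return [i * step for i in indices]


def encode_two_layers(signal, core_step=0.5, enh_step=0.05):
    """Return (core_stream, enhancement_stream) for one block of samples."""
    core_stream = quantize(signal, core_step)
    # Local decoding of the first data stream (the "decoding means").
    decoded = dequantize(core_stream, core_step)
    # Difference signal: media signal minus decoded core signal.
    difference = [s - d for s, d in zip(signal, decoded)]
    enhancement_stream = quantize(difference, enh_step)
    return core_stream, enhancement_stream


def decode_layers(core_stream, enhancement_stream=None,
                  core_step=0.5, enh_step=0.05):
    """Decode the core alone, or core plus enhancement when available."""
    out = dequantize(core_stream, core_step)
    if enhancement_stream is not None:
        refinement = dequantize(enhancement_stream, enh_step)
        out = [o + r for o, r in zip(out, refinement)]
    return out
```

A receiver that gets only the core stream still reconstructs a coarse signal; adding the enhancement stream reduces the error to at most half the finer step size.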

19. A scalable encoder according to claim 18, characterized in that
the first encoding means is a speech encoder and the second encoding
means is an audio encoder.

20. A scalable encoder according to claim 19, characterized in that
the speech encoder is a multi-rate speech encoder and the audio
encoder is a variable rate audio encoder.

21. A scalable encoder according to claim 19, characterized in that
the speech encoder is a variable rate speech encoder and the audio
encoder is a variable rate audio encoder.

22. A scalable encoder according to claim 1, characterized in that the
first encoding means is a base layer video encoding means and the
second encoder comprises at least one enhancement layer video encoding
means.

23. A scalable encoder according to any of the preceding claims,
characterized in that it further comprises

- third encoding means for producing a fourth data stream, which is a
  core data stream corresponding to a second media signal, having a
  fourth bit-rate, and
- fourth encoding means for producing a fifth data stream, which
  comprises a set of enhancement data streams corresponding to the
  second media signal, having a fifth bit-rate,

and in that the multiplexer is arranged to combine at least the first,
the second, the fourth and the fifth data streams into a third data
stream, and the control means is arranged to determine a target
combination of the first, the second, the fourth and the fifth data
streams in the third data stream according to the control information
and to adjust the combination of said data streams in the third data
stream by affecting the first, the second, the fourth and the fifth
bit-rates.

24. A multimedia terminal (20), which comprises a scalable encoder
comprising

- first encoding means (210) for producing a first data stream, which
  is a core data stream relating to the media signal, having a first
  bit-rate,
- second encoding means (230) for producing a second data stream,
  which comprises a set of enhancement data streams relating to the
  media stream, having a second bit-rate, and
- a multiplexer (110) for combining at least the first data stream
  and the second data stream into a third data stream,

characterized in that it further comprises control means (420, 421,
422), which is arranged to receive control information (401), to
determine a target combination of the first data stream and the second
data stream in the third data stream according to the control
information and to adjust the combination of the first data stream and
the second data stream in the third data stream by affecting the first
and the second bit-rates.

25. A multimedia terminal according to claim 24, characterized in that
it further comprises an input element (510) for inputting preference
information indicating a preferred combination of the first data
stream and the second data stream, said preference information being
delivered as control information to the control means.

26. A multimedia terminal according to claim 25, characterized in that
said input element constitutes a part of a user interface of the
multimedia terminal.

27. A multimedia terminal according to claim 26, characterized in that
the user interface comprises a slide switch.

28. A multimedia terminal according to claim 25, characterized in that
said input element is arranged to receive external control
information.

29. A multimedia terminal according to claim 28, characterized in that
said input element is arranged to receive control information from a
communication network.

30. A multimedia terminal according to claim 28, characterized in that
said input element is arranged to receive control information from a
second multimedia terminal.

31. A multimedia terminal (1300) according to any of 
the claims 24 to 30, characterized in that it is a 
mobile station of a mobile communication network. 

32. A multimedia terminal according to any of the claims 24 to 30,
characterized in that it is an H.324 multimedia terminal.

33. A method for scalable encoding of a media signal, which method
comprises the steps of:

- encoding the media signal into a first data stream, which is a core
  data stream corresponding to the media signal, having a first
  bit-rate,
- encoding the media signal into a second data stream, which
  comprises a set of enhancement data streams corresponding to the
  media signal, having a second bit-rate, and
- multiplexing at least the first data stream and the second data
  stream into a third data stream,

characterized in that it further comprises the steps of:

- receiving (1001, 1003) control information,
- determining (1002, 1005, 1007) a target combination of the first
  data stream and the second data stream in the third data stream
  according to the control information, and
- adjusting (1010) the combination of the first data stream and the
  second data stream in the third data stream by affecting the first
  and the second bit-rates.
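
Editorial illustration (not part of the claims): the multiplexing step of claim 33 can be pictured as interleaving per-frame payloads of the two streams into one byte stream. The sketch below uses a simple length-prefixed layout invented for this example (a 1-byte stream identifier and a 2-byte length); it is an assumption and does not reproduce any standardized multiplexing protocol. The helper `frame_bytes` is also hypothetical and shows how the two bit-rates fix the share of each stream in the combined stream.

```python
import struct

CORE, ENHANCEMENT = 0, 1   # hypothetical stream identifiers
HEADER = ">BH"             # 1-byte stream id, 2-byte payload length


def frame_bytes(bitrate_bps, frame_ms):
    """Payload size in bytes of one frame at the given bit-rate."""
    return bitrate_bps * frame_ms // 8000


def multiplex(core_frames, enhancement_frames):
    """Interleave core and enhancement frames into a third data stream."""
    out = bytearray()
    for core, enh in zip(core_frames, enhancement_frames):
        for stream_id, payload in ((CORE, core), (ENHANCEMENT, enh)):
            out += struct.pack(HEADER, stream_id, len(payload)) + payload
    return bytes(out)


def demultiplex(stream):
    """Split the combined stream back into (core_frames, enhancement_frames)."""
    core_frames, enhancement_frames = [], []
    header_size = struct.calcsize(HEADER)
    pos = 0
    while pos < len(stream):
        stream_id, length = struct.unpack_from(HEADER, stream, pos)
        pos += header_size
        payload = stream[pos:pos + length]
        pos += length
        target = core_frames if stream_id == CORE else enhancement_frames
        target.append(payload)
    return core_frames, enhancement_frames
```

Changing the first and second bit-rates changes the payload sizes passed to `multiplex`, and with them the proportion of the combined stream that each data stream occupies.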

34. A method according to claim 33, characterized in that it further
comprises the steps of:

- determining according to the control information a preferred ratio
  for a target bit-rate of the first data stream and a target
  bit-rate of the second data stream,
- determining jointly said target bit-rates,
- feeding the third data stream into a buffer, and
- determining the occupancy level of the buffer,

and in that when the occupancy level of the buffer is below a certain
first threshold (T2), the ratio of said target bit-rates is
substantially said preferred ratio.

35. A method according to claim 34, characterized in that when the
occupancy level of the buffer is below a certain second threshold
(T1), the target bit-rate for the first data stream is determined
based on the content of the media signal.